This is the tutorial for the R primer workshop. I will go through with you some R basics and good practices for data exploration, data visualization, and spatial data manipulation using R.
We will enjoy three sections (use the left panel to navigate)
— R basics and beyond
— Data visualization with ggplot2
— Geospatial data analysis
Notice
workshop.R file # this section will automatically install the missing packages on your own computer
# list the packages that we need
packages_need <- c("data.table", # read in csv file and doing data analysis
"raster", # analyze raster data
"sf", # analyze vector data
"ggplot2", # data visualization
"ggspatial", # plot raster map
"RColorBrewer", # give me some nice color palettes
"scales",
"cowplot",
"shiny", # interactive data visualization
"shinyjs", # interactive data visualization
"leaflet" # interactive data visualization
)
# What the code below does is to check if your R has these packages installed (the require() function), if not, install them (the install.packages() function), and then load the packages using the library() function.
for (package in packages_need){
if (!require(package, character.only = TRUE)){
install.packages(package)
}
library(package, character.only = TRUE)
}
Good style is important because while your code only has one author, it’ll usually have multiple readers. This is especially true when you’re writing code with others. Learn more about the good style, click Here
File names should be meaningful and end in .R.
# Good
fit-models.R
utility-functions.R
# Bad
foo.r
stuff.r
If files need to be run in sequence, prefix them with numbers.
0-download.R
1-parse.R
2-explore.R
The variable name generally starts with letter and can contain number, letter, underscore(_) and period(.). Example: day_one, day.one
Reserved words or keywords (such as TRUE, FALSE, function, if, else, break,for etc.) are not allowed
Special characters such as ‘#’, ‘&’, etc., along with White space (tabs, space) are not allowed
One more note about variables: R is a case-sensitive language. So, variable x is not the same as X
# Good (shortcut for assignment operator: Option + Minus on MAC and Alt + Minus on Windows & Linux)
x <- 5
# Bad
x = 5
# Good
sample(x = 1:10, size = 5)
To learn more about the differences between <- and =, click Here
Strive for names that are concise and meaningful (this is not easy!).
# Good
day_one
day_1
# Bad
first_day_of_the_month
dayone
djm1
Where possible, avoid using names of existing functions and variables. Doing so will cause confusion for the readers of your code.
# Bad
T <- FALSE
c <- 10
mean <- function(x) sum(x)
When using R as a calculator, the order of operations is the same as you would have learned back in school. From highest to lowest precedence:
(, )^ or *** and /+ and -# First, let's set up our working directory
# An addition (short cut for running selected code: Cmd + Return on Mac and Ctrl + Enter on Windows & Linux)
5 + 5
## [1] 10
# A subtraction
5 - 5
## [1] 0
# A multiplication
3 * 5
## [1] 15
# A division
9 / 2
## [1] 4.5
# integer division
9 %/% 2
## [1] 4
# x mod y (“x modulo y”)
9 %% 2
## [1] 1
# An Exponentiation
3^2
## [1] 9
# A root extraction
4^(0.5)
## [1] 2
# make if fancy
3 * (4/5 + 2^2)
## [1] 14.4
# assign the value to a variable
x <- 5
R works with many data types. Some of the most basic types to get started are:
numericnumericlogicalcharacter# Declare variables of different types
my_integer <- 1
my_decimal <- 1.5
my_character <- "REU student"
my_logical <- TRUE
# Check class of my_integer and my_decimal
class(my_integer)
## [1] "numeric"
class(my_decimal)
## [1] "numeric"
class(my_character)
## [1] "character"
class(my_logical)
## [1] "logical"
# OK, how about some real integer types
# The advantage in using real integer types is that it can save you some storage memory in your computer
# But in most cases, you don't need to care!
my_real_integer <- 1L
class(my_real_integer)
## [1] "integer"
Notice
# Shortcut for inserting pipe operator: Cmd + Shift + M on MAC and Ctrl + Shift + M on Windows & Linux)
# vector
# "c" means concatenate
v_num <- c(1, 2, 3, 4, 5)
v_char <- c("a", "b", "c")
v_logical <- c(TRUE, FALSE, FALSE)
# test different data types in a vector
# 1 and 2 will be automatically converted to character data type
v_test <- c(1, 2, "a")
# check the length of the vector
length(v_num)
## [1] 5
# select some values from the vector
v_num[1]
## [1] 1
v_num[1:3]
## [1] 1 2 3
v_num[c(1, 5)]
## [1] 1 5
v_num[-1]
## [1] 2 3 4 5
v_num > 2
## [1] FALSE FALSE TRUE TRUE TRUE
v_num[v_num > 2]
## [1] 3 4 5
# give some names to the vector
names(v_num) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
# select based on name
v_num["Monday"]
## Monday
## 1
# sum up two vectors
v_a <- c(1, 2, 3, 4, 5)
v_b <- c(2, 4, 6, 8, 10)
v_total <- v_a + v_b
# concatenate two vectors
v_join <- c(v_a, v_b)
# do some basic statistics on vector
# mean value
mean(v_a)
## [1] 3
# standard deviation
sd(v_a)
## [1] 1.581139
# quickly create a vector
# create a vector sequence
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
# create a vector sequence with 2 as the step
seq(1, 10, by = 2)
## [1] 1 3 5 7 9
# create a vector with 5 replicates
rep(2, times = 5)
## [1] 2 2 2 2 2
# create a vector with lower-case or captial letters
letters[1:10]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
LETTERS[1:10]
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
# create a long vector
v_long <- 1:1000
# use head and tail to show the first or last several elements in the vector
head(v_long)
## [1] 1 2 3 4 5 6
tail(v_long)
## [1] 995 996 997 998 999 1000
# Shortcut for inserting pipe operator: Cmd + Shift + M on MAC and Ctrl + Shift + M on Windows & Linux)
# unordered (nominal categorical variable)
v_fruit <- c("apple", "banana", "orange", "peach")
f_fruit <- factor(v_fruit)
f_fruit[1] > f_fruit[2]
## Warning in Ops.factor(f_fruit[1], f_fruit[2]): '>' not meaningful for factors
## [1] NA
# ordered (ordinal categorical variable)
v_temp <- c("High", "Low", "High","Low", "Medium")
f_temp <- factor(v_temp)
f_temp <- factor(v_temp, levels = c("Low", "Medium", "High"), ordered = TRUE)
f_temp[1] > f_temp[2]
## [1] TRUE
v_a <- 1:6
# create a 2*3 matrix
m_a <- matrix(v_a, nrow = 2, ncol = 3)
m_a
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
matrix(v_a, nrow = 2)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
# force it to expand along the row
matrix(v_a, nrow = 2, byrow = TRUE)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
# check the total dimension
dim(m_a)
## [1] 2 3
# check how many rows
nrow(m_a)
## [1] 2
# check how many columns
ncol(m_a)
## [1] 3
# select specific row
m_a[1, ]
## [1] 1 3 5
# select specific columns
m_a[, 1]
## [1] 1 2
# let's display m_a again
m_a
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
# calculate the totals for each row of a matrix
rowSums(m_a)
## [1] 9 12
# calculate the totals for each column of a matrix
colSums(m_a)
## [1] 3 7 11
# calculate the mean for each row of a matrix
rowMeans(m_a)
## [1] 3 4
# calculate the sd for each row or each column
# unfortunately, we don't have a built-in rowSds...
# we use the apply function
# Margin = 1 means along the row (row-wise)
# Margin = 1 2 means along the column (column_wise)
apply(m_a, MARGIN = 1, FUN = sd)
## [1] 2 2
apply(m_a, MARGIN = 2, FUN = sd)
## [1] 0.7071068 0.7071068 0.7071068
# check if the apply is right
sd(m_a[1, ])
## [1] 2
sd(m_a[, 1])
## [1] 0.7071068
We use the mpg data set that is shipped with the ggplot2 package (we already loaded ggplot2 so we can directly access the data). mpgprovides fuel economy data from 1999 and 2008 for 38 popular models of cars.
dim(mpg) # dimension
## [1] 234 11
# as you can see, each column in data.frame has the same data type, while different columns can have different data types
str(mpg) # data structure
## tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
## $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
## $ model : chr [1:234] "a4" "a4" "a4" "a4" ...
## $ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr [1:234] "f" "f" "f" "f" ...
## $ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr [1:234] "p" "p" "p" "p" ...
## $ class : chr [1:234] "compact" "compact" "compact" "compact" ...
summary(mpg) # data statistics
## manufacturer model displ year
## Length:234 Length:234 Min. :1.600 Min. :1999
## Class :character Class :character 1st Qu.:2.400 1st Qu.:1999
## Mode :character Mode :character Median :3.300 Median :2004
## Mean :3.472 Mean :2004
## 3rd Qu.:4.600 3rd Qu.:2008
## Max. :7.000 Max. :2008
## cyl trans drv cty
## Min. :4.000 Length:234 Length:234 Min. : 9.00
## 1st Qu.:4.000 Class :character Class :character 1st Qu.:14.00
## Median :6.000 Mode :character Mode :character Median :17.00
## Mean :5.889 Mean :16.86
## 3rd Qu.:8.000 3rd Qu.:19.00
## Max. :8.000 Max. :35.00
## hwy fl class
## Min. :12.00 Length:234 Length:234
## 1st Qu.:18.00 Class :character Class :character
## Median :24.00 Mode :character Mode :character
## Mean :23.44
## 3rd Qu.:27.00
## Max. :44.00
head(mpg) # first several rows
## # A tibble: 6 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa…
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa…
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa…
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa…
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa…
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa…
tail(mpg) # last several rows
## # A tibble: 6 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 volkswagen passat 1.8 1999 4 auto(l5) f 18 29 p midsi…
## 2 volkswagen passat 2 2008 4 auto(s6) f 19 28 p midsi…
## 3 volkswagen passat 2 2008 4 manual(m… f 21 29 p midsi…
## 4 volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p midsi…
## 5 volkswagen passat 2.8 1999 6 manual(m… f 18 26 p midsi…
## 6 volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p midsi…
# we convert the data.frame to data.table (a special type of data.frame that we will use for all the data manipulation)
dt_mpg <- as.data.table(mpg)
# select elements of data.frame by position
dt_mpg[1, ]
## manufacturer model displ year cyl trans drv cty hwy fl class
## 1: audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
dt_mpg[, 1]
## manufacturer
## 1: audi
## 2: audi
## 3: audi
## 4: audi
## 5: audi
## ---
## 230: volkswagen
## 231: volkswagen
## 232: volkswagen
## 233: volkswagen
## 234: volkswagen
dt_mpg[1:2, ]
## manufacturer model displ year cyl trans drv cty hwy fl class
## 1: audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
## 2: audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
dt_mpg[1:2, c(2,3)]
## model displ
## 1: a4 1.8
## 2: a4 1.8
# extract 1 column as a vector
head(dt_mpg[[1]])
## [1] "audi" "audi" "audi" "audi" "audi" "audi"
# select elements of data.frame by name
# the .() operator keeps the output as data.table
dt_mpg[1:5, .(year)]
## year
## 1: 1999
## 2: 1999
## 3: 2008
## 4: 2008
## 5: 1999
dt_mpg[1:5, .(model, year)]
## model year
## 1: a4 1999
## 2: a4 1999
## 3: a4 2008
## 4: a4 2008
## 5: a4 1999
dt_mpg[1:5, .(model, year)]
## model year
## 1: a4 1999
## 2: a4 1999
## 3: a4 2008
## 4: a4 2008
## 5: a4 1999
# extract 1 column as a vector
dt_mpg[1:5, year]
## [1] 1999 1999 2008 2008 1999
# or you can use the dollar sign
head(dt_mpg$year)
## [1] 1999 1999 2008 2008 1999 1999
# select elements of data.frame by logical vector
head(dt_mpg$year > 2005)
## [1] FALSE FALSE TRUE TRUE FALSE FALSE
head(dt_mpg[dt_mpg$year > 2005, ])
## manufacturer model displ year cyl trans drv cty hwy fl class
## 1: audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
## 2: audi a4 2.0 2008 4 auto(av) f 21 30 p compact
## 3: audi a4 3.1 2008 6 auto(av) f 18 27 p compact
## 4: audi a4 quattro 2.0 2008 4 manual(m6) 4 20 28 p compact
## 5: audi a4 quattro 2.0 2008 4 auto(s6) 4 19 27 p compact
## 6: audi a4 quattro 3.1 2008 6 auto(s6) 4 17 25 p compact
# since we are using data.table, why not use the tidy way?
head(dt_mpg[year > 2005, ])
## manufacturer model displ year cyl trans drv cty hwy fl class
## 1: audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
## 2: audi a4 2.0 2008 4 auto(av) f 21 30 p compact
## 3: audi a4 3.1 2008 6 auto(av) f 18 27 p compact
## 4: audi a4 quattro 2.0 2008 4 manual(m6) 4 20 28 p compact
## 5: audi a4 quattro 2.0 2008 4 auto(s6) 4 19 27 p compact
## 6: audi a4 quattro 3.1 2008 6 auto(s6) 4 17 25 p compact
# calculate the average mileage of highway and city for each single row
head(dt_mpg[, mileage_mean := (cty + hwy)/2])
## manufacturer model displ year cyl trans drv cty hwy fl class
## 1: audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
## 2: audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
## 3: audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
## 4: audi a4 2.0 2008 4 auto(av) f 21 30 p compact
## 5: audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
## 6: audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
## mileage_mean
## 1: 23.5
## 2: 25.0
## 3: 25.5
## 4: 25.5
## 5: 21.0
## 6: 22.0
# or a fancy way
head(dt_mpg[, mileage_mean := rowMeans(.SD), .SDcols = c("cty", "hwy")])
## manufacturer model displ year cyl trans drv cty hwy fl class
## 1: audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
## 2: audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
## 3: audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
## 4: audi a4 2.0 2008 4 auto(av) f 21 30 p compact
## 5: audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
## 6: audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
## mileage_mean
## 1: 23.5
## 2: 25.0
## 3: 25.5
## 4: 25.5
## 5: 21.0
## 6: 22.0
# calculate average highway mileage for all car models
dt_mpg[, mean(hwy)]
## [1] 23.44017
# But if we want to keep the result as data.table, we can use the .() operator
dt_mpg[, .(hwy_mean = mean(hwy))]
## hwy_mean
## 1: 23.44017
# calculate average highway mileage for different classes
dt_mpg[, .(hwy_mean = mean(hwy)), by = class]
## class hwy_mean
## 1: compact 28.29787
## 2: midsize 27.29268
## 3: suv 18.12903
## 4: 2seater 24.80000
## 5: minivan 22.36364
## 6: pickup 16.87879
## 7: subcompact 28.14286
# order the average highway mileage of different classes
dt_mpg[, .(hwy_mean = mean(hwy)), by = class][order(hwy_mean)]
## class hwy_mean
## 1: pickup 16.87879
## 2: suv 18.12903
## 3: minivan 22.36364
## 4: 2seater 24.80000
## 5: midsize 27.29268
## 6: subcompact 28.14286
## 7: compact 28.29787
# order the average highway mileage of different classes (descending)
dt_mpg[, .(hwy_mean = mean(hwy)), by = class][order(-hwy_mean)]
## class hwy_mean
## 1: compact 28.29787
## 2: subcompact 28.14286
## 3: midsize 27.29268
## 4: 2seater 24.80000
## 5: minivan 22.36364
## 6: suv 18.12903
## 7: pickup 16.87879
# drop one column
dt_mpg[, year := NULL]
# drop several columns
dt_mpg[, c("drv", "cyl") := NULL]
dt_mpg
## manufacturer model displ trans cty hwy fl class mileage_mean
## 1: audi a4 1.8 auto(l5) 18 29 p compact 23.5
## 2: audi a4 1.8 manual(m5) 21 29 p compact 25.0
## 3: audi a4 2.0 manual(m6) 20 31 p compact 25.5
## 4: audi a4 2.0 auto(av) 21 30 p compact 25.5
## 5: audi a4 2.8 auto(l5) 16 26 p compact 21.0
## ---
## 230: volkswagen passat 2.0 auto(s6) 19 28 p midsize 23.5
## 231: volkswagen passat 2.0 manual(m6) 21 29 p midsize 25.0
## 232: volkswagen passat 2.8 auto(l5) 16 26 p midsize 21.0
## 233: volkswagen passat 2.8 manual(m5) 18 26 p midsize 22.0
## 234: volkswagen passat 3.6 auto(s6) 17 26 p midsize 21.5
Every ggplot2 plot has three key components:
Note that when using rgb() in R, 0-255 needs to be scaled to 0-1.
ggplot()
ggplot(data = mpg)
ggplot(data = mpg, aes(x=displ, y=cty))
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point()
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black")
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black", stroke=0.1, alpha = 0.7)
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black", stroke=0.1, alpha = 0.7) +
theme_bw()
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black", stroke=0.1, alpha = 0.7) +
theme_classic()
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black", stroke=0.1, alpha = 0.7) +
theme_minimal()
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black", stroke=0.1, alpha = 0.7) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank()
)
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank()
)
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14)
)
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill="red", color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10))
)
ggplot(data = mpg, aes(x=displ, y=cty, fill=class)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10))
)
# use the "Accent" palette
ggplot(data = mpg, aes(x=displ, y=cty, fill=class)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
scale_fill_brewer(palette = "Accent") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10))
)
# use the "Dark2" palette
ggplot(data = mpg, aes(x=displ, y=cty, fill=class)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
scale_fill_brewer(palette = "Dark2") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10))
)
### Alternative to RColorBrewer palette
color_manual <- c("#001219", "#0a9396", "#94d2bd", "#e9d8a6", "#ee9b00", "#ca6702", "#9b2226")
# use show_col from the scales package to see your color
show_col(color_manual)
# use the "Accent" palette
ggplot(data = mpg, aes(x=displ, y=cty, fill=class)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
scale_fill_manual(values=color_manual) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10))
)
# use the "Accent" palette
ggplot(data = mpg, aes(x=displ, y=cty, fill=class)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
scale_fill_brewer(palette = "Accent") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.position = "bottom",
legend.background = element_rect(color = "gray", size = 0.5),
legend.title = element_text(size=12),
legend.text = element_text(size=10)
)
# use the "Accent" palette
ggplot(data = mpg, aes(x=displ, y=cty, fill=class)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
scale_fill_brewer(palette = "Accent") +
guides(fill = guide_legend(nrow = 1)) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.position = "bottom",
legend.background = element_rect(color = "gray", size = 0.5),
legend.title = element_text(size=12),
legend.text = element_text(size=10)
)
# use the "Accent" palette
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill = "red", color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
facet_wrap(vars(class)) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10))
)
# use the "Accent" palette
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=3, fill = "red", color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
facet_wrap(vars(class)) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
strip.text = element_text(size = 12)
)
# use the "Accent" palette
ggplot(data = mpg, aes(x=displ, y=cty)) +
geom_point(shape=21, size=2, fill = "red", color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
geom_smooth(method = lm, se = TRUE, color="#999999", size = 0.5) +
facet_wrap(vars(class)) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
strip.text = element_text(size = 12)
)
## `geom_smooth()` using formula 'y ~ x'
ggplot(data = mpg, aes(x=displ, y=cty, fill=hwy)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
scale_fill_viridis_c(option = "magma") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10))
)
# change the legend aesthetics
range(mpg$hwy)
## [1] 12 44
ggplot(data = mpg, aes(x=displ, y=cty, fill=hwy)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x="engine displacement in liters",
y = "city mileage") +
scale_fill_continuous(type = "viridis",
option = "magma",
limits = c(10, 45), # limits are needed to make the labels work properly
breaks = seq(10, 45, by = 5),
labels = seq(10, 45, by = 5)) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.title = element_text(size=14, vjust = 5),
legend.text = element_text(size=12),
legend.key.height = unit(15, units = "mm"),
legend.key.width = unit(3, units = "mm")
)
# Bar plot: how many cars are in each class
ggplot(data = mpg) +
geom_bar(aes(x=class)) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
)
# Bar plot: let's flip the x-y axis
ggplot(data = mpg) +
geom_bar(aes(x=class)) +
coord_flip() +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
)
# Boxplot of hwy in each class
ggplot(data = mpg) +
geom_boxplot(aes(x=class, y=hwy)) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
)
# Boxplot of hwy in each class and in each cylinder
ggplot(data = mpg) +
geom_boxplot(aes(x=class, y=hwy, fill=as.factor(cyl)), size = 0.2) +
coord_flip() +
scale_fill_brewer(name="Number of cylinders", palette = "Accent") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
)
# Histograms of hwy
ggplot(data = mpg) +
geom_histogram(aes(x=hwy)) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Histograms of hwy in each class
ggplot(data = mpg) +
geom_histogram(aes(x=hwy, fill=class), bins = 20, position = "dodge") +
scale_fill_brewer(palette = "Accent") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
)
# Density of hwy
ggplot(data = mpg) +
geom_density(aes(x=hwy), size=0.2) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
)
# Density of hwy in each class
# we can clearly see the differences between small and big cars
ggplot(data = mpg) +
geom_density(aes(x=hwy, fill=class), size=0.2, alpha = 0.5) +
scale_fill_brewer(palette = "Accent") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
)
# OK, let's save our figure
ggsave(filename = "density_plot.pdf", width = 6, height = 4)
Notice
Geographic vs Projected coordinate reference systems, projection transformation, and raster vs. vector format, but let’s forget about all of these and start with a real example# learn more about time counting here: https://www.alexejgossmann.com/benchmarking_r/
# much faster using fread from data.table
system.time(dt_temp <- fread(file.path("data", "gcoos_2020_01_air_temperature.csv")))
## user system elapsed
## 0.041 0.003 0.045
# slower when reading csv files using the built-in read.csv function
system.time(df_test <- read.csv(file.path("data", "gcoos_2020_01_air_temperature.csv")))
## user system elapsed
## 0.278 0.009 0.287
# use rmarkdown to make the table pretty
rmarkdown::paged_table(dt_temp)
# check the data
str(dt_temp)
## Classes 'data.table' and 'data.frame': 175044 obs. of 6 variables:
## $ network : chr "ADCP" "CenGOOS" "LUMCON" "LUMCON" ...
## $ platform : chr "ioos:station:wmo:42395" "ioos:station:wmo:42067" "ioos:station:LUMCON:102" "ioos:station:LUMCON:wisl1" ...
## $ latitude : num 26.4 30 29.2 29.1 28.9 ...
## $ longitude : num -90.8 -88.6 -90.6 -90.2 -89.4 ...
## $ date : POSIXct, format: "2020-01-01 00:00:00" "2020-01-01 00:00:00" ...
## $ air_temperature: num 17.9 13.9 13.7 50.2 -9999 ...
## - attr(*, ".internal.selfref")=<externalptr>
ggplot(dt_temp) +
geom_histogram(aes(x=air_temperature))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (stat_bin).
quantile(dt_temp$air_temperature, prob = seq(0, 1, 0.01), na.rm = TRUE)
## 0% 1% 2% 3% 4% 5% 6%
## -9999.0000 -9999.0000 -9999.0000 -9999.0000 -9999.0000 -9999.0000 -9999.0000
## 7% 8% 9% 10% 11% 12% 13%
## -9999.0000 -9999.0000 5.0000 6.7000 7.9000 8.8000 9.4000
## 14% 15% 16% 17% 18% 19% 20%
## 9.9000 10.4000 10.7000 11.1000 11.4000 11.6000 11.9000
## 21% 22% 23% 24% 25% 26% 27%
## 12.1000 12.4000 12.6000 12.8000 13.0000 13.2000 13.5000
## 28% 29% 30% 31% 32% 33% 34%
## 13.7000 13.9000 14.1000 14.3000 14.5644 14.7000 14.9000
## 35% 36% 37% 38% 39% 40% 41%
## 15.1000 15.3000 15.4000 15.6000 15.8000 15.9000 16.1000
## 42% 43% 44% 45% 46% 47% 48%
## 16.3000 16.4000 16.6000 16.7000 16.9000 17.1000 17.2000
## 49% 50% 51% 52% 53% 54% 55%
## 17.4000 17.6000 17.7000 17.9000 18.1000 18.2000 18.4000
## 56% 57% 58% 59% 60% 61% 62%
## 18.5000 18.7000 18.8000 19.0000 19.1000 19.2000 19.3000
## 63% 64% 65% 66% 67% 68% 69%
## 19.5000 19.6000 19.7000 19.9000 20.0000 20.1000 20.3000
## 70% 71% 72% 73% 74% 75% 76%
## 20.4000 20.6000 20.7000 20.9000 21.1000 21.3000 21.4000
## 77% 78% 79% 80% 81% 82% 83%
## 21.6000 21.8000 22.0000 22.2000 22.4000 22.6000 22.8000
## 84% 85% 86% 87% 88% 89% 90%
## 23.1000 23.3000 23.6000 23.8000 24.2000 24.5000 24.8000
## 91% 92% 93% 94% 95% 96% 97%
## 25.2000 25.6000 25.9000 26.3000 26.7000 27.2000 27.6000
## 98% 99% 100%
## 28.1000 73.3000 93.4000
# filter out the bad ones
dt_temp <- dt_temp[air_temperature > -20 & air_temperature < 40]
ggplot(dt_temp) +
geom_histogram(aes(x=air_temperature))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# average air temperature by platform
dt_mean <- dt_temp[,
.(temp = mean(air_temperature),
network = first(network),
lat = first(latitude),
lon = first(longitude)),
by = platform]
dt_mean
## platform temp network lat lon
## 1: ioos:station:wmo:42395 21.05588 ADCP 26.4040 -90.7920
## 2: ioos:station:wmo:42067 14.63252 CenGOOS 30.0430 -88.6490
## 3: ioos:station:LUMCON:102 15.20056 LUMCON 29.1870 -90.6093
## 4: ioos:station:NOAA.NDBC:CDRF1 15.18620 NDBC 29.1360 -83.0290
## 5: ioos:station:NOAA.NDBC:FWYF1 22.08396 NDBC 25.5910 -80.0970
## ---
## 113: ioos:station:NOAA.NOS.CO-OPS:8776139 17.41861 NOS 27.4800 -97.3220
## 114: ioos:station:LUMCON:wisl1 37.98065 LUMCON 29.1144 -90.1840
## 115: ioos:station:WAVCIS:CSI06 16.89686 WAVCIS 28.8667 -90.4833
## 116: ioos:station:DISL:MBLA 12.52477 DISL 30.4367 -88.0117
## 117: ioos:station:USF.COMPS:C10 17.33979 COMPS 27.1690 -82.9260
# let's plot our data
ggplot(data = dt_mean, aes(x=lon, y=lat, fill=temp)) +
geom_point(shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x = "Longitude",
y = "Latitude") +
scale_fill_continuous(name = "Temperature (Celsius)",
type = "viridis",
option = "magma") +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 16),
axis.text = element_text(size = 14),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.title = element_text(size=12, vjust = 5),
legend.text = element_text(size=10),
legend.key.height = unit(15, units = "mm"),
legend.key.width = unit(3, units = "mm")
)
Although we can plot our temperature as above, we are not plotting it in the geographic context!
Coordinate System is the most general term for a system that includes coordinates. It includes two common types:
A geographic coordinate system (GCS) is a reference framework that defines the locations of features on a model of the earth. It’s shaped like a globe—spherical. Its units are angular, usually degrees.
A projected coordinate system (PCS) is flat. It contains a GCS, but it converts that GCS into a flat surface, using math (the projection algorithm) and other parameters. Its units are linear, most commonly in meters.
There are two fundamental types of spatial objects: vectors and rasters. The main difference between Raster and Vector Data is that the raster data represents data as a cell or a grid matrix while vector data represents data using sequential points or vertices. Raster data can be made from vector data and vice versa depending on the goals of an analysis. Learn more
# convert data into a spatial vector
sf_temp <- st_as_sf(dt_mean, coords = c("lon", "lat"), crs = 4326)
# check the data
sf_temp
## Simple feature collection with 117 features and 3 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -97.4706 ymin: 17.002 xmax: -80.0333 ymax: 30.7083
## geographic CRS: WGS 84
## First 10 features:
## platform temp network geometry
## 1 ioos:station:wmo:42395 21.05588 ADCP POINT (-90.792 26.404)
## 2 ioos:station:wmo:42067 14.63252 CenGOOS POINT (-88.649 30.043)
## 3 ioos:station:LUMCON:102 15.20056 LUMCON POINT (-90.6093 29.187)
## 4 ioos:station:NOAA.NDBC:CDRF1 15.18620 NDBC POINT (-83.029 29.136)
## 5 ioos:station:NOAA.NDBC:FWYF1 22.08396 NDBC POINT (-80.097 25.591)
## 6 ioos:station:NOAA.NDBC:KTNF1 14.14388 NDBC POINT (-83.593 29.819)
## 7 ioos:station:NOAA.NDBC:PTAT2 16.67026 NDBC POINT (-97.051 27.826)
## 8 ioos:station:NOAA.NDBC:SANF1 22.45478 NDBC POINT (-81.877 24.456)
## 9 ioos:station:NOAA.NDBC:SAUF1 15.75027 NDBC POINT (-81.265 29.857)
## 10 ioos:station:NOAA.NDBC:SGOF1 15.93136 NDBC POINT (-84.858 29.408)
# read in land outline
# we use st_read from sf package to read in vector file (here it is a .shp format)
sf_land <- st_read(file.path("data", "ne_50m_land", "ne_50m_land.shp"))
## Reading layer `ne_50m_land' from data source `/Users/shuang-zhang/Dropbox/Stuff/00-TAMU/REU/R_workshop/R_codes/data/ne_50m_land/ne_50m_land.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1420 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
## geographic CRS: WGS 84
ggplot() +
geom_sf(data=sf_land, fill="#dedede", size=0.1, color="#aaaaaa") +
geom_sf(data=sf_temp, aes(fill=temp), shape=21, size=1, color="black", stroke=0.1, alpha = 0.7) +
labs(x = "Longitude",
y = "Latitude") +
scale_fill_continuous(name = "Temperature",
type = "viridis",
option = "magma") +
coord_sf(xlim = c(-180, 180), ylim = c(-90, 90), expand = FALSE) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.title = element_text(size=10, vjust = 2),
legend.text = element_text(size=8),
legend.key.height = unit(10, units = "mm"),
legend.key.width = unit(3, units = "mm")
)
# change the projection to Mercator
sf_land_crop <- st_crop(sf_land, c(xmin = -180, ymin = -85, xmax = 180, ymax = 85))
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
sf_land_mercator <- st_transform(sf_land_crop, crs = 3857)
sf_temp_mercator <- st_transform(sf_temp, crs = 3857)
ggplot() +
geom_sf(data=sf_land_mercator, fill="#dedede", size=0.1, color="#aaaaaa") +
geom_sf(data=sf_temp_mercator, aes(fill=temp), shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x = "x",
y = "y") +
scale_fill_continuous(name = "Temperature",
type = "viridis",
option = "magma") +
coord_sf(expand = FALSE, datum = 3857) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 8),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.title = element_text(size=10, vjust = 2),
legend.text = element_text(size=8),
legend.key.height = unit(10, units = "mm"),
legend.key.width = unit(3, units = "mm")
)
# crop the land to the Gulf of Mexico region
c_us_crop <- c(xmin = -105, ymin = 15, xmax = -75, ymax = 35)
sf_land_us <- st_crop(sf_land, c_us_crop)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
ggplot() +
geom_sf(data=sf_land_us, fill="#dedede", size=0.1, color="#aaaaaa") +
geom_sf(data=sf_temp, aes(fill=temp), shape=21, size=3, color="black", stroke=0.1, alpha = 0.7) +
labs(x = "Longitude",
y = "Latitude") +
scale_fill_continuous(name = "Temperature",
type = "viridis",
option = "magma") +
coord_sf(expand = FALSE) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.title = element_text(size=10, vjust = 2),
legend.text = element_text(size=8),
legend.key.height = unit(10, units = "mm"),
legend.key.width = unit(3, units = "mm")
)
# Now we read in the global average sea surface temperature
# we use raster from raster package to read in raster file (here it is a .tif format)
rast_temp <- raster(file.path("data", "BO2_tempmean_ss_lonlat.tif"))
rast_temp
## class : RasterLayer
## dimensions : 2160, 4320, 9331200 (nrow, ncol, ncell)
## resolution : 0.08333333, 0.08333333 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : /Users/shuang-zhang/Dropbox/Stuff/00-TAMU/REU/R_workshop/R_codes/data/BO2_tempmean_ss_lonlat.tif
## names : BO2_tempmean_ss_lonlat
## values : -1.797733, 30.17863 (min, max)
# plot the raster (will see how we can plot this using the ggspatial package coupled with ggplot2)
# the simple way
plot(rast_temp)
# check crs
crs(rast_temp)
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
# check the resolution
res(rast_temp)
## [1] 0.08333333 0.08333333
# check the dimension
dim(rast_temp)
## [1] 2160 4320 1
# check the extent
extent(rast_temp)
## class : Extent
## xmin : -180
## xmax : 180
## ymin : -90
## ymax : 90
# Calculate Raster Min and Max Values
minValue(rast_temp)
## [1] -1.797733
maxValue(rast_temp)
## [1] 30.17863
# get the pixel value
rast_temp[1, 1]
##
## -1.716213
# calculate the mean temperature
cellStats(rast_temp, mean)
## [1] 13.82039
cellStats(rast_temp, sd)
## [1] 11.48417
sd(values(rast_temp), na.rm = TRUE)
## [1] 11.48417
# crop the raster
c_us_rast_crop <- c(xmin = -105, xmax = -75, ymin = 15, ymax = 35)
rast_temp_crop <- crop(rast_temp, y = c_us_rast_crop)
plot(rast_temp_crop)
# Here we use ggplot2 and ggspatial to plot the raster in the ggplot2 way
ggplot() +
geom_sf(data=sf_land_us, fill="#dedede", size=0.1, color="#aaaaaa") +
layer_spatial(data = rast_temp_crop, aes(fill = stat(band1))) +
scale_fill_continuous(name = "Temperature",
type = "viridis",
option = "magma",
na.value = NA) +
coord_sf(expand = FALSE) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.title = element_text(size=10, vjust = 2),
legend.text = element_text(size=8),
legend.key.height = unit(10, units = "mm"),
legend.key.width = unit(3, units = "mm")
)
## Warning: Removed 40481 rows containing missing values (geom_raster).
# read in the shapefile of marine regions from International Hydrographic Organization (https://www.marineregions.org/gazetteer.php?p=details&id=4288)
sf_iho <- st_read(file.path("data", "World_Seas_IHO_v1", "World_Seas.shp"))
## Reading layer `World_Seas' from data source `/Users/shuang-zhang/Dropbox/Stuff/00-TAMU/REU/R_workshop/R_codes/data/World_Seas_IHO_v1/World_Seas.shp' using driver `ESRI Shapefile'
## Simple feature collection with 101 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -85.47029 xmax: 180 ymax: 90
## geographic CRS: WGS 84
# extract the Gulf of Mexico
sf_gulf <- sf_iho[sf_iho$NAME == "Gulf of Mexico", ]
# check the extraction
ggplot() +
geom_sf(data=sf_land_us, fill="#dedede", size=0.1, color="#aaaaaa") +
geom_sf(data = sf_gulf, fill = "#9bf6ff")
# mask the seawater temperature with the shapefile of the Gulf of Mexico
rast_temp_crop
## class : RasterLayer
## dimensions : 240, 360, 86400 (nrow, ncol, ncell)
## resolution : 0.08333333, 0.08333333 (x, y)
## extent : -105, -75, 15, 35 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : BO2_tempmean_ss_lonlat
## values : 21.01431, 29.46556 (min, max)
rast_temp_gulf <- mask(rast_temp_crop, sf_gulf)
rast_temp_gulf
## class : RasterLayer
## dimensions : 240, 360, 86400 (nrow, ncol, ncell)
## resolution : 0.08333333, 0.08333333 (x, y)
## extent : -105, -75, 15, 35 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : BO2_tempmean_ss_lonlat
## values : 23.08181, 28.14442 (min, max)
plot(rast_temp_gulf)
# shrink the extent
rast_temp_gulf_trim <- trim(rast_temp_gulf)
plot(rast_temp_gulf_trim)
# plot the final product
ggplot() +
geom_sf(data=sf_land_us, fill="#dedede", size=0.1, color="#aaaaaa") +
layer_spatial(data = rast_temp_gulf_trim, aes(fill = stat(band1))) +
scale_fill_continuous(name = "Temperature",
type = "viridis",
option = "magma",
na.value = NA) +
coord_sf(expand = FALSE) +
theme_bw() +
theme(panel.grid.major = element_line(size = 0.5, linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.title.x = element_text(margin = margin(t=10)),
axis.title.y = element_text(margin = margin(r=10)),
legend.title = element_text(size=10, vjust = 2),
legend.text = element_text(size=8),
legend.key.height = unit(10, units = "mm"),
legend.key.width = unit(3, units = "mm")
)
## Warning: Removed 11577 rows containing missing values (geom_raster).
# extract the raster points to a data.frame so that you can use it later
df_temp_gulf <- rasterToPoints(rast_temp_gulf_trim)
dt_temp_gulf <- as.data.table(df_temp_gulf)
rmarkdown::paged_table(dt_temp_gulf)
Notice
leaflet package to plot our map interactively# define the color palette
pal <- colorNumeric(
palette = "magma",
domain = values(rast_temp_gulf_trim),
na.color = NA
)
leaflet() %>%
addTiles() %>%
addRasterImage(x = rast_temp_gulf_trim, colors = pal, opacity = 1) %>%
addLegend("bottomright",
pal = pal,
values = values(rast_temp_gulf_trim),
title = "Temperature",
)
if else statement, the for loop, and string manipulation) partly due to the time limit and partly due to our focus on data analysis (instead of solving equations and analyzing texts). You can easily learn these stuff from online tutorials, such as here